A NONCENTRAL ANALYSIS OF VARIANCE MODEL
RELATING STATISTICAL AND PRACTICAL SIGNIFICANCE

Introduction 

Statement of the Problem 

One of the most widely used methods of analyzing research data in
the behavioral sciences is the analysis of variance (ANOVA), particularly
the fixed effects model (Morrison & Henkel, 1969). Integrally tied in
with this model is the idea of hypothesis testing in the form of tests
of statistical significance. Of three types of statistical inference
(point estimation, interval estimation, and hypothesis testing), behavioral
scientists have devoted themselves almost exclusively to hypothesis
testing (Heermann & Braskamp, 1970).

Several writers have criticized the current use of ANOVA (Selvin,
1957; DuBois, 1965; Bakan, 1966; Lykken, 1968; Fleiss, 1969; Overall,
1969). Other writers have suggested that with appropriate corrective
steps, the basic ANOVA model is an exemplary method of analyzing data and
obtaining meaningful results (Horst, 1967; Kempthorne & Doerfler, 1969;
Winch & Campbell, 1969).

Some critics have argued that tests of significance, as done in
ANOVA, essentially should not be used (e.g., Morrison & Henkel, 1970);
however, the pervasive influence of tradition has been recognized
(Sterling, 1959; Rozeboom, 1960; Lykken, 1968; Heermann & Braskamp, 1970).
More recently Walker and Schaffarzick (1974), while reluctantly using the
criterion of statistical significance to compare studies, expressed the
hope for an improved methodology.



It would seem valuable to modify the ANOVA model such that some
inherent weaknesses, including those discussed by the aforementioned
critics, are overcome. In particular, it would be desirable to relate
practical significance more closely to statistical significance.

The notion of practical significance is complex in and of itself.
There is no commonly accepted method of determining practicality. In
educational research, where outcomes are not easily described in
cost-benefit terms, it is often quite difficult to decide if a difference due
to treatment is of educational or practical significance. Nevertheless,
such assessments of practical significance are being made, and the
current ANOVA model does not adequately handle the issue of practical
significance.

An analysis of variance model, based on the noncentral F distribution,
is presented in this paper as an attempt to improve upon the currently
used ANOVA model, in particular in the area of the inadequate handling
of practical significance.

Review of Related Research

Criticisms of Significance Testing 

The literature critical of significance testing has appeared mainly
in the past 15-17 years (Morrison & Henkel, 1970). Periodically researchers
have been reminded that statistical significance does not necessarily
imply practical significance (Selvin, 1957; DuBois, 1965; Mendenhall,
1968; Glass & Hakstian, 1969). In essence, what this warning says for
the ANOVA case is that F tests with their associated p (for probability)
level of significance are not sufficient means for assessing results.
Nevertheless, reviewers sometimes use only significance levels when
comparing results from several studies (e.g., Eysenck, 1960; Bracht, 1970).

Other authors have treated significant F values as implying sizable
differences (Guilford, 1956; Mendenhall, 1968). Guilford (1956, p. 275)
described the ANOVA results of a study:

The F ratio for machines is significant beyond the .01 point, 
leaving us with considerable confidence that the machine 
differences, as such, have a real bearing upon the difficulty 
of the task. 

Strictly speaking, such a significant F could have resulted where the
differences were trivial (in the practical sense). The following theorem
proves that for any predetermined (small) number, a statistically
significant F (for J = 2) or t ratio can be obtained, but such that the
differences due to treatment are less than that predetermined number.

Theorem: For any $c > 0$ and $0 < |\bar{X}_1 - \bar{X}_2| < c$, there
exists an $N_0$ such that if sample size $N > N_0$, then t is statistically
significant for an ordinary t-test. (In layman's terminology: With
a large enough sample size, statistical significance is obtainable
no matter how trivial the difference in means is.)

Proof: Let $c > 0$ be given with $|\bar{X}_1 - \bar{X}_2| < c$. Without loss
of generality, assume $s_1 = s_2 = s$ (homogeneity of variance satisfied)
and $n_1 = n_2 = n$ (equal cell sizes), so that $N = n_1 + n_2 = 2n$. Then

$$|t| = \frac{|\bar{X}_1 - \bar{X}_2|}{s\,(2/n)^{1/2}}
      = \frac{n^{1/2}\,|\bar{X}_1 - \bar{X}_2|}{s\,2^{1/2}}.$$

Require that

$$n^{1/2} > \frac{10\,s\,2^{1/2}}{|\bar{X}_1 - \bar{X}_2|}
\quad\Longrightarrow\quad
n > \frac{200\,s^2}{|\bar{X}_1 - \bar{X}_2|^2}.$$

Therefore, if $N = 2n > 400\,s^2 / |\bar{X}_1 - \bar{X}_2|^2$, then

$$|t| > \frac{10\,s\,2^{1/2}}{|\bar{X}_1 - \bar{X}_2|} \cdot
        \frac{|\bar{X}_1 - \bar{X}_2|}{s\,2^{1/2}} = 10,$$

and t is statistically significant.

Q. E. D.
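
The bound in the proof can be checked numerically. The following is a minimal sketch, assuming equal group sizes and a common standard deviation as in the proof; the function names and the illustrative values (a difference of 0.3 with s = 1) are hypothetical and do not come from the paper.

```python
import math

def two_sample_t(diff, s, n):
    """t statistic for two groups of size n, common SD s, mean difference diff."""
    return diff / (s * math.sqrt(2.0 / n))

def n_per_group_for_t_above_10(diff, s):
    """Smallest integer n satisfying n > 200 * s^2 / diff^2 (the proof's bound)."""
    return math.floor(200.0 * s ** 2 / diff ** 2) + 1

# Hypothetical values: a small difference relative to the standard deviation.
diff, s = 0.3, 1.0
n = n_per_group_for_t_above_10(diff, s)
t = two_sample_t(diff, s, n)
print("n per group:", n, " total N:", 2 * n, " t:", round(t, 4))
```

Under these assumptions the group size just above the bound pushes t past 10, while one subject fewer per group leaves it just below, which is the sense in which sample size alone can produce significance.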



Because of the reliance on statistical significance in current
research methodology, a misleading picture appears in the literature.
A classical example was described by Bakan (1966). Suppose
$H_0$ (the null hypothesis) is true in the population. Accordingly,
if tests of significance are carried out by 100 independent
researchers, those 95 (approximately) who do not get statistical
significance probably will not bother to publish their findings.
The five who do attain statistical significance will be more
inclined to publish their findings, and they will make Type I errors.

Part of the misinterpretation of p values is due to a
misunderstanding of what the probabilities relate to. Camilleri (1962)
defined three types of probability: (1) intrinsic probability
between population variables (e.g., in a population of scores, what
is the probability of a score being greater than one population
standard deviation above the population mean), (2) auxiliary
probability between a sample and a population (e.g., maximum likelihood
estimates), and (3) inductive probability relating to the probable
validity of a hypothesis; that is, scientific inference. He
asserted that significance tests have been used for assessing
inductive probability when really they are more appropriate for
auxiliary probability.

Morrison & Henkel (1969) argued persuasively that, statistically
speaking, most research does not qualify from the standpoint
of legitimate use of significance tests. They presented the
following paradigm:

                                 Type of Population Sampled

Type of Sampling Technique       Specified      Unspecified

Probability                          A               B

Nonprobability                       C               D

The only legitimate use of significance tests is with studies in Category A.

Several writers have criticized the all-or-none method involved with
significance testing (Rozeboom, 1960; Bakan, 1966; Meehl, 1967). Science
progresses by adjustments of degree of belief rather than firm decisions.
Bakan feels that the tests have little if anything to contribute to
scientific inference. He does agree with Rozeboom that the tests are
appropriate for making null hypothesis decisions.

R. A. Fisher (1959, p. 44) issued a caution about the interpretation
of significance levels:

They (tests of significance) do not generally lead to any
probability statements about the real world, but to a
rational and well-defined measure of reluctance to the
acceptance of the hypotheses they test.

Accepting the Null Hypothesis 

Some writers have advocated the accepting of $H_0$ if a significant
statistic is not observed (Walker & Lev, 1953; Guilford, 1956; Guenther,
1964; Kirk, 1968; Glass & Stanley, 1970). Yet statisticians often have
warned against such practices unless the power (probability of rejecting
$H_0$ when the alternative hypothesis is true) is known (Berkson, 1942;
Peatman, 1953; Mendenhall, 1968). Cohen (1969), however, has shown
that in typical psychological research, power of greater than .90 would
require larger samples than are usually available.

To emphasize the inappropriateness of accepting $H_0$ without knowing
the power, the proof of a simple theorem is presented which states that
for a given level of significance, there exist normal distributions
such that the F or t statistic will not be significant, but the size
of the effects will be larger than any predetermined number.

Theorem: There exist distributions satisfying the ANOVA assumptions
such that the null hypothesis is not rejected, but the means
differ by more than any pre-given number. (In layman's terminology:
If, upon a nonsignificant test statistic, you accept $H_0$,
then you may be calling a huge difference a "zero difference.")

Proof (2-sample case): Let $c > 0$ be given. Require that
$|\bar{X}_1 - \bar{X}_2| > c$. Without loss of generality, assume
$s_1 = s_2 = s$ and $n_1 = n_2 = n$. Then

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s\,(2/n)^{1/2}}
    = \frac{n^{1/2}\,(\bar{X}_1 - \bar{X}_2)}{s\,2^{1/2}}.$$

Let $s = |\bar{X}_1 - \bar{X}_2| \cdot n^{1/2}$. Then

$$t = \frac{n^{1/2}\,(\bar{X}_1 - \bar{X}_2)}
           {2^{1/2}\,|\bar{X}_1 - \bar{X}_2|\,n^{1/2}}
    = \pm .707 \quad \text{(not significant)}.$$

Q. E. D.
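
The construction in this proof can also be checked numerically. This is a minimal sketch, assuming the standard deviation is set to the absolute mean difference times the square root of n, as in the proof; the function name and the chosen differences are hypothetical.

```python
import math

def t_with_inflated_sd(diff, n):
    """t statistic when the common SD is chosen as |diff| * sqrt(n)."""
    s = abs(diff) * math.sqrt(n)          # the proof's choice of s
    return diff / (s * math.sqrt(2.0 / n))

# Hypothetical differences spanning six orders of magnitude.
results = {diff: t_with_inflated_sd(diff, n=20) for diff in (1.0, 1e3, 1e6)}
for diff, t in results.items():
    print(diff, round(t, 3))
```

However large the difference in means, t stays at about .707 under this choice of s, so a nonsignificant result by itself says nothing about the size of the difference.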

This proof indicates that a researcher who accepts $H_0$ may be calling an
essentially infinite difference a "zero difference." McNemar's (1962)
suggestion of using three regions (acceptance, suspended judgment, and
rejection), depending on the size of the p, does not overcome this objection.

Conventional Rejection Levels 

The subservience to using conventional levels (e.g., .01 or .05)
was criticized over 30 years ago along with the very phrasing of "test
of significance" (Snedecor, 1942). Despite more current warnings about
the ready acceptance of conventional significance levels (McNemar, 1962;
Winer, 1962; Slough, 1963; Skipper, Guenther, & Nass, 1967; Labovitz,
1968), the American Psychological Association Publication Manual (V 37)
advocates the use of asterisks to indicate the various conventional levels.
DuBois (1968) has contended that conventional levels promote objectivity.

Rosenthal and Gaito reported a "cliff" effect where researchers
showed the greatest loss of confidence between p = .05 and p = .10 (Rosenthal
& Gaito, 1963). However, a subsequent replication did not find this
effect (Beauchamp & May, 1964). Both studies noted that students and
faculty in the field of psychological research expressed more confidence
(degree of belief in research findings) in the same p values based on
100 than on 10 cases in the sample, despite the fact that this meant
that the smaller sample usually exhibited a larger difference.

Estimating Sample Size 

Much attention has been devoted to the estimation of sample size
with regard to detecting differences as statistically significant.
Winer (1962) and Cohen (1969) produced tables that are difficult to use.
A simpler method was presented by Overall and Dalal (1968). Most of
the writers in this area have emphasized sample size or power with
regard to obtaining statistical significance rather than sharpening of
estimates. For example, Cohen (1969) defined "power" as the probability
that an investigation would lead to statistically significant results.
Such a definition implies that the purpose of increasing power is to
increase the probability of obtaining statistical significance. It
also means that absurdities logically follow; for example, a larger
sample is sometimes deemed less desirable than a smaller one without any
mention of cost effectiveness (Hays, 1963). If, however, power is
defined in terms of sharpness of estimates rather than ability to detect
differences as statistically significant, then such an anomaly does not
arise, for in this case, the larger the sample, the better the estimate.
Furthermore, if the statistical level of significance is related to the
level of practical significance, then the importance of sample size is
placed in proper perspective. Since a zero null hypothesis can always
be rejected with a large enough sample under the ordinary ANOVA model
(see the first theorem above), the decision making depends more on N than
on the estimation of the parameters. The noncentral ANOVA that is proposed in
this paper will result in a closer relationship between estimation and
decision making.

Confidence Intervals 

The most commonly accepted method of treating practical significance
statistically is the use of confidence intervals around linear
combinations of means. These confidence intervals are described in most
educational statistics books, but are offered more as options than as
recommended and expected procedures (Guilford, 1956; McNemar, 1962).
Nevertheless, several educational researchers are beginning to use
confidence interval procedures, in particular the post-hoc methods of Tukey
and Scheffé (Scheffé, 1959). These post-hoc procedures control the
Type I error (rejecting a null hypothesis when, in fact, it is true) for
all contrasts, and as such are a definite improvement over
computing several t-tests. However, since post-hoc contrasts are
computed after a statistically significant F test without regard for
practical significance, much effort may be spent on putting bounds around
trivial results. What is needed is an F-type test which is significant
in the statistical sense only if the results are also of practical
significance.

Another confidence interval approach has been advocated by Rosenkrantz
(1972). The use of "direct confidence intervals" can promote the control
of the probability of indecisive results and thus provide opportunity
to study the weaknesses of the model under consideration.

Measures of Association

Another accepted means of attempting to relate practical significance
with statistical significance is the use of measures of association
like $\omega^2$ (Hays, 1963), which represent the percent of variance explained
(Nunnally, 1960; Duggan & Dean, 1968). These measures, unlike the p
values associated with the F test, are relatively independent of sample
size (Kennedy, 1970).



Summary of Related Research

Tests of statistical significance, as commonly used, have been
frequently criticized. Suggested improvements have also fallen short,
particularly because of the continued lack of relationship between
statistical and practical significance. Well known writers have
advocated inappropriate methodology. Confidence intervals and measures of
association were seen as ways of assessing practical significance.
A test that relates statistical and practical significance is also needed.

Noncentral Analysis of Variance

Related Research and Theory 

ANOVA's peculiar characteristic of sometimes resulting in statistical
but not practical significance leads to the following situation: a
statistical rejection of the null hypothesis can coincide with (1) a
practical difference, or (2) a trivial difference.

Some researchers regard levels of significance as indications of
the degree of certainty in the results. This certainty, however, refers
to the probability that the true difference is not exactly zero. It
does not refer to the size of the difference. With a large enough sample,
a difference can be minuscule and yet the p value could easily imply
that it is very certain that the true difference is not exactly zero.
Consider also the following example:

$\bar{X}_1 = 7$, $n_1 = 2$, $\bar{X}_2 = 2$, $n_2 = 2$, $s_{pooled} = 1$.

Then

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}\,(1/n_1 + 1/n_2)^{1/2}} = 5.$$

If the low p value (p < .05) obtained is used as a measure of certainty,
then this example shows a case where one would be "certain" about the
results based on a sample of only four subjects.
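
The arithmetic of such a four-subject example can be sketched as follows. The input values (means of 7 and 2, two subjects per group, pooled standard deviation of 1) are an assumed reading of the example, and the helper names are hypothetical. With 2 degrees of freedom the t distribution has a simple closed-form tail, so no statistics library is needed.

```python
import math

def pooled_t(mean1, mean2, s_pooled, n1, n2):
    """Ordinary two-sample t statistic with a pooled standard deviation."""
    return (mean1 - mean2) / (s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2))

def two_sided_p_df2(t):
    """Exact two-sided p value for Student's t with 2 degrees of freedom."""
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)

# Assumed example values: four subjects in total, df = n1 + n2 - 2 = 2.
t = pooled_t(7.0, 2.0, 1.0, 2, 2)
p = two_sided_p_df2(t)
print("t =", t, " two-sided p =", round(p, 4))
```

Under these assumptions p falls below .05 even though only four subjects were observed, which is the point of the example.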

The most commonly accepted solution to the statistical-practical
significance dilemma seems to be one of first ascertaining statistical
significance and second assessing the practical significance of any
statistically significant results. In essence, the statistical test
does not necessarily match up with the practical one.

Since practicality often is assessed post hoc (after the statistical
test), it is reasonable to ask for an a priori (before the test)
assessment so that a more appropriate null hypothesis can be used. The
unquestioning acceptance of always using a zero difference null hypothesis
has been criticized by several writers (Grant, 1962; Kerlinger, 1964;
Cohen, 1969).

For the two sample t-test, it has been suggested (Dixon & Massey,
1969; Pena, 1970) that if d represents a practical difference, then the
test statistic is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - d}{s_{\bar{X}_1 - \bar{X}_2}},$$

where $s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference
between the two means.

In this case the use of the ordinary t statistic would amount to asking
the wrong question. Instead of asking whether there is a difference at
all, researchers usually should be asking whether or not there is an
educational or practical difference. Instead of asking whether a Datsun
gets better mileage than a Cadillac, we should be asking how many more
miles per gallon a Datsun gets and whether this difference is of practical
importance.

Using Dixon and Massey's model, if a researcher obtains a statistically
significant difference, then it will also be of practical significance
(i.e., greater than the preassigned value of d). Basically this procedure
results in the test of the appropriate (non-zero) null hypothesis.
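
The effect of testing against a nonzero practical difference d can be illustrated with a short sketch. It assumes the pooled-variance standard error for the denominator; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def shifted_t(mean1, mean2, d, s_pooled, n1, n2):
    """t statistic for the null hypothesis that the mean difference equals d."""
    se = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)  # standard error of the difference
    return ((mean1 - mean2) - d) / se

# Hypothetical study: observed difference of 2 points, practical threshold d = 3.
ordinary = shifted_t(52.0, 50.0, 0.0, 5.0, 200, 200)  # zero-difference null
shifted = shifted_t(52.0, 50.0, 3.0, 5.0, 200, 200)   # practical-difference null
print("ordinary t:", round(ordinary, 3), " shifted t:", round(shifted, 3))
```

In this made-up case the ordinary statistic is large and positive while the shifted statistic is negative, so a difference that looks significant against zero gives no evidence of exceeding the practical threshold.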

As indicated earlier, analysis of variance needs a similar procedure
since trivial differences may be statistically significant, and Tukey's
or Scheffé's confidence interval procedures (Scheffé, 1959) would merely
be putting bounds around trivial differences. Fortunately, the noncentrality
parameter of the noncentral F distribution provides an analog to the d
used in the t statistic just described. Again, if the minimum practical
difference is greater than zero, then use of the ordinary F test amounts
to asking the wrong question.

Once a researcher has determined what constitutes a practical
difference, then the next problem is to associate this difference with
the appropriate noncentrality parameter. If this $\delta$ is correctly
determined, then the new model guarantees that statistical significance will
be related to practical significance. The influence of sample size on
the F value is no longer a problem, since as N increases, $\delta$ also increases
in such a way that the critical F value is automatically adjusted upwards
to compensate for the increase in the F statistic due to the larger
sample size.

Estimating the Noncentrality Parameter Associated with a Practical 
Difference 

Kirk (1968) defined the noncentrality parameter as

$$\delta = \frac{\sum_{j=1}^{J} n_j\,(\mu_j - \mu)^2}{\sigma_\epsilon^2}$$

where J = number of treatments
      $n_j$ = number of subjects in the jth treatment
      $\mu_j$ = mean of the jth treatment
      $\mu$ = grand mean
      $\sigma_\epsilon^2$ = error variance

The noncentrality parameter expresses the size of the effects in
terms of the differences between the various group means and the grand
mean. When one speaks of practical differences, he is usually referring
to the differences between treatments rather than the difference between
each treatment and the overall mean. Of course, if a researcher could
relate what he considers a practical difference with the squared
differences between the group means and the grand mean, then he could
directly substitute into the formula for $\delta$ and thus determine the
noncentrality parameter associated with a practical difference.

Footnote: Tang's (1938) classic on power gave its own definition of the
noncentrality parameter.

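Since, however, differences among treatments is the more common
approach, these differences will be related to $\delta$ by the following
theorem:

Theorem:

$$\sum_{j=1}^{J} (\mu_j - \mu_{..})^2 = \frac{1}{J} \sum_{i<j} (\mu_i - \mu_j)^2$$

where J = number of treatments, $\mu_i$ = mean of the ith treatment,
and $\mu_{..} = (\sum_{j=1}^{J} \mu_j)/J$.

Footnote: Gini (undated) proved a similar theorem.

Proof: Write each pairwise difference in terms of deviations from the
grand mean and expand:

$$\sum_{i<j} (\mu_i - \mu_j)^2
  = \frac{1}{2} \sum_{i=1}^{J} \sum_{j=1}^{J}
    \left[ (\mu_i - \mu_{..}) - (\mu_j - \mu_{..}) \right]^2
  = J \sum_{j=1}^{J} (\mu_j - \mu_{..})^2
    - \Big[ \sum_{j=1}^{J} (\mu_j - \mu_{..}) \Big]^2 .$$

Since $\sum_{j} (\mu_j - \mu_{..}) = 0$, the last term vanishes, and
dividing both sides by J gives the result.

Q. E. D.

The main purpose of this theorem is to enable the researcher
to relate practical differences, which can be expressed in terms
of $(\mu_i - \mu_j)$, to $\delta$, which is a function of $(\mu_j - \mu_{..})^2$.
It would be difficult if the researcher had to translate practical
differences in terms of $(\mu_j - \mu_{..})$.

Given J treatments, let M = the minimum practical average
difference between treatments, expressed in terms of absolute
values:

$$M = \frac{\sum_{i<j} |\mu_i - \mu_j|}{\binom{J}{2}}
\qquad \text{(average of pairwise differences, absolute values)}$$

Consider the case where cell sample sizes are equal. Since
$\delta = n \sum_{j} (\mu_j - \mu_{..})^2 / \sigma_\epsilon^2$ and the previous
theorem showed that
$\sum_{j} (\mu_j - \mu_{..})^2 = \frac{1}{J} \sum_{i<j} (\mu_i - \mu_j)^2$,
then a reasonable trial substitution for $|\mu_i - \mu_j|$ is M.
The square of the average of the pairwise differences is to be
substituted for the average of the (differences)$^2$. So we have

$$\delta \approx \frac{n}{\sigma_\epsilon^2} \cdot \frac{1}{J} \binom{J}{2} M^2
        = \frac{n\,(J - 1)\,M^2}{2\,\sigma_\epsilon^2}.$$

For J = 3, we have $\delta \approx n M^2 / \sigma_\epsilon^2$.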

Besides using the minimum practical average M, it is also
possible to substitute the individual minimum pairwise differences
if these can be stated by the researcher. In addition, orthogonal
a priori contrasts may be performed with a $\delta$ being determined by
the particular contrast. Post-hoc contrasts like Scheffé's can
still be used as currently practiced since they control the Type I
error for all contrasts, independent of the truth of the null hypothesis
(Scheffé, 1959).

Let R be the ratio of the exact value of $\delta$ to the value obtained
from the substitution of M, a measure of how good an approximation results from
using the square of the average pairwise difference in place of the
average of the squared pairwise differences. Using several sample sizes, means,
and variances, a computer program was used to compute several such ratios.
R turns out to be a function of the relative distances between the means;
for example, the lowest ratio (and therefore, the best estimate) was
obtained when the means were equally distant.
The worst estimate occurred when two means were as far away from the third
mean as possible (e.g., $\mu_1 = 0$, $\mu_2 = \mu_3 = 15$). In between, the
ratio was exactly determined by the variable $e = |a - b|/(a + b)$, where
$a = |\mu_1 - \mu_2|$, $b = |\mu_2 - \mu_3|$, and $\mu_1 \le \mu_2 \le \mu_3$.
Accordingly, $\delta$ can be readily determined by multiplying the value
obtained from M by R.
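
The behavior of R can be reproduced with a short sketch. It assumes that R is the average of the squared pairwise differences divided by the square of the average absolute pairwise difference, and that the trial value of the noncentrality parameter for equal cell sizes is n(J - 1)M^2 / (2 sigma^2). The function names are hypothetical, and the closed form R = 1.125 + 0.375 e^2 for three means is derived here from that assumption rather than quoted from the paper.

```python
from itertools import combinations

def pairwise_abs_diffs(means):
    """Absolute differences for every pair of means."""
    return [abs(a - b) for a, b in combinations(means, 2)]

def correction_ratio(means):
    """Assumed R: mean squared pairwise difference / (mean absolute difference)^2."""
    d = pairwise_abs_diffs(means)
    mean_sq = sum(x * x for x in d) / len(d)
    sq_mean = (sum(d) / len(d)) ** 2
    return mean_sq / sq_mean

def e_statistic(means):
    """e = |a - b| / (a + b) for three means, with a and b the adjacent gaps."""
    m = sorted(means)
    a, b = m[1] - m[0], m[2] - m[1]
    return abs(a - b) / (a + b)

def delta_from_practical_difference(M, n, sigma2, J, R=1.0):
    """Trial noncentrality parameter from M, equal cell sizes assumed."""
    return R * n * (J - 1) * M ** 2 / (2.0 * sigma2)

print(round(correction_ratio([0, 5, 10]), 3))    # equally spaced means
print(round(correction_ratio([0, 15, 15]), 3))   # two means coincide
print(delta_from_practical_difference(M=10, n=20, sigma2=25, J=3))
```

Under this reading, equally spaced means give the smallest ratio (about 1.13) and two coincident means give the largest (1.5), matching the two special cases described above.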
Figure 1: R in Terms of Position of Means, with $e = |a - b|/(a + b)$.

Special cases:

i) e = 0 => R = 1.13 (best estimate): the means are equally spaced,
   so a = b and e = 0/(a + b) = 0.

ii) e = 1 => R = 1.5 (worst estimate): two of the means coincide,
    so b = 0 and e = a/a = 1.

Figure 2: R as a Function of e

Performing the Noncentral F Test

Once the a priori practical difference is used to determine
$\delta_0$, an ordinary F test is performed to test the hypotheses:

$H_0$: $\delta \le \delta_0$ (there is no practical difference)
$H_1$: $\delta > \delta_0$ (there is a practical difference)

Instead of rejecting $H_0$ when the observed $F > F_{\nu_1, \nu_2}(1 - \alpha)$,
now $H_0$ is rejected when the observed $F > F'_{\nu_1, \nu_2, \delta_0}(1 - \alpha)$,
the noncentral F cutoff point.

Patnaik's Approximation of Noncentral F 

The noncentral F distribution (denoted by F') has been tabled
only partially (Johnson & Welch, 1939; Barton, David, & O'Neill,
1960; Severo & Zelen, 1960; Tiku, 1966). Unlike central F, F'
cannot, in general, be expressed in closed form (Wishart, 1932;
Price, 1964). A reasonably complete F' table would probably be
too unwieldy for practical use (there are 389 pages in Resnikoff
and Lieberman's Tables of the Non-central t-distribution, 1957).
Of the several approximation procedures developed, Patnaik's (1949)
seems the most usable since it utilizes the already available and
familiar central F tables.

Although Patnaik's method involves laborious computation
(Feldt & Mahmoud, 1958; Grubbs, Coon, & Pearson, 1966), the
resulting formulas for the fixed effects ANOVA case are relatively simple.
The accuracy of Patnaik's approximation has been verified in
several studies (Pearson, 1952; Tukey, 1957; Sankaran, 1963; Seber,
1963). A brief outline of the method appears in the Appendix to
Scheffé's The Analysis of Variance (1959).

Derivation of Patnaik's Approximation for ANOVA

It can be shown that $E(\chi'^2_{\nu_1, \delta}) = \nu_1 + \delta$ and that the
variance is $2\nu_1 + 4\delta$, where $\chi'^2_{\nu_1, \delta}$ = noncentral
$\chi^2$ with $\nu_1$ degrees of freedom and noncentrality parameter $\delta$
(Scheffé, 1959).

A possible approximation of $\chi'^2_{\nu_1, \delta}$ is $c\,\chi^2_{\nu}$.

Equating means and variances of the two distributions, we get

$$c\,\nu = \nu_1 + \delta \quad (\text{since } E(\chi^2_\nu) = \nu)
\qquad \text{and} \qquad
2 c^2 \nu = 2\nu_1 + 4\delta \quad (\text{since } \mathrm{var}(\chi^2_\nu) = 2\nu).$$

Solving for c and $\nu$ gives

$$c = \frac{\nu_1 + 2\delta}{\nu_1 + \delta}, \qquad
\nu = \frac{(\nu_1 + \delta)^2}{\nu_1 + 2\delta}.$$

Noncentral F can be considered as $(U_1/\nu_1)/(U_2/\nu_2)$, where $U_1$ is
$\chi'^2_{\nu_1, \delta}$ and $U_2$ is $\chi^2_{\nu_2}$. Replacing $U_1$ by
$c\,\chi^2_\nu$ gives

$$F'_{\nu_1, \nu_2, \delta} \approx \frac{c\,\nu}{\nu_1}\,F_{\nu, \nu_2}
  = \frac{\nu_1 + \delta}{\nu_1}\,F_{\nu, \nu_2}.$$
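
The moment matching behind Patnaik's approximation can be sketched in a few lines. This is a minimal sketch of the textbook form of the approximation, assuming the noncentral chi-square has mean nu1 + delta and variance 2 nu1 + 4 delta; the function names are hypothetical, and the central F percentile itself would still come from a table or a statistics library.

```python
def patnaik_chi2(nu1, delta):
    """Scale c and degrees of freedom nu such that c * chi2(nu) matches the
    mean and variance of a noncentral chi-square with (nu1, delta)."""
    c = (nu1 + 2.0 * delta) / (nu1 + delta)
    nu = (nu1 + delta) ** 2 / (nu1 + 2.0 * delta)
    return c, nu

def patnaik_f_scale(nu1, delta):
    """Multiplier k and numerator df nu with F'(nu1, nu2, delta) ~ k * F(nu, nu2)."""
    c, nu = patnaik_chi2(nu1, delta)
    return c * nu / nu1, nu

# Hypothetical inputs: two numerator degrees of freedom, delta = 10.
c, nu = patnaik_chi2(2, 10.0)
k, nu_f = patnaik_f_scale(2, 10.0)
print("c =", round(c, 4), " nu =", round(nu, 4), " multiplier =", round(k, 4))
```

A noncentral cutoff is then approximated by multiplying the central F percentile with (nu, nu2) degrees of freedom by the multiplier k.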



Footnote: Scheffé's problem IV.4 has an error in it. The expression
Pr (i i (T^'^Xl-^jSj'^ read Pr it£(%'^)(ii- ^) 
With the formulas for $\nu$ and $F'_{\nu_1, \nu_2, \delta}$, a noncentral analysis
of variance can now be performed. In the following illustrative
example, notice that the observed F statistic would be significant
for an ordinary ANOVA.

Illustrative Example

Noncentral ANOVA:
J = 3, N = 60, n = 20, $\sigma_\epsilon^2$ = 25

1. Researcher states that the average difference between pairs
   of treatments must be greater than 10 in order for there to
   be practical significance.

2. The sample means are 38.9, 51.6, 53.3; F = 47.8.

3. The noncentral F cutoff point is 93.5.

Since F < 93.5, do not reject "$H_0$: There is no practical difference."

Monte Carlo Test of Noncentral ANOVA

Rationale:

The CAL DEVIATE computer program (Hutchinson, 1967) was used
to generate pseudorandom samples from three normal populations
with the following parameters: $\mu_1 = 40$, $\mu_2 = 50$, $\mu_3 = 55$;
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 25$; $n_1 = n_2 = n_3 = 20$. The Monte
Carlo test for noncentral ANOVA is based on the following rationale:
(1) Suppose a researcher has been able to state a minimum
average practical difference among three groups. A method
described earlier relates this functionally to $\delta_0$, where the
researcher wants a statistically significant result to imply that
the average differences among groups are such that $\delta > \delta_0$. (2) If
the populations are set up such that the population $\delta = \delta_0$, then
a verification of the model would require that for a Type I error
rate of $\alpha$, $100\alpha\%$ of the time the observed F would exceed the
tabled or computed noncentral F. Figure 3 presents pictorially
what is happening.

Figure 3: Rejection Region for Noncentral ANOVA. Reject $H_0$: $\delta = \delta_0$
if the observed F is greater than $F'_{\nu_1, \nu_2, \delta_0}(1 - \alpha)$.

Notice that if the ordinary ANOVA were used, a statistically
significant result would occur much more often than $100\alpha\%$ of the
time even though the true differences are not sufficient to be of
practical significance.

Procedure:

One hundred analyses of variance were run using the BMD01V
program. The population is set up so that it barely misses
meeting the criterion of having practical difference. In ANOVA
hypothesis testing language:

$H_0$: $\delta \le \delta_0$ (there is no practical difference)
$H_1$: $\delta > \delta_0$ (there is a practical difference)

where $\delta_{pop} = \delta_0$. Since $\bar{X}$ (sample mean) is an unbiased estimate
of $\mu$ (population mean), the grand means of the three entire
samples generated are used in calculating $\delta_{pop}$. Each $\bar{X}_j$
represents the mean of all the data generated from the jth population.

$\bar{X}_1 = 39.95$, $\bar{X}_2 = 50.44$, $\bar{X}_3 = 54.99$
$\bar{X}_{..} = 48.46$ = grand mean of all the samples combined
$\alpha_1 = -8.51$, $\alpha_2 = 1.98$, $\alpha_3 = 6.53$ ($\alpha_j = \bar{X}_j - \bar{X}_{..}$)
$\alpha_1^2 = 72.25$, $\alpha_2^2 = 3.92$, $\alpha_3^2 = 42.64$, $\sum \alpha_j^2 = 118.81$

Using the formulas derived earlier in this paper, the multiplier of the
central F is 48.42 and

$$\nu = \frac{(48.42)^2}{2(48.42) - 1} = 24.5$$

$$F'_{\nu_1, \nu_2, \delta}(1 - \alpha) = (48.42)\,F_{24.5,\,57}(1 - \alpha)$$

    = 102.65 for $\alpha$ = .01
    = 82.31 for $\alpha$ = .05
    = 73.11 for $\alpha$ = .10

Since 100 samples were run, approximately 1 F value should
exceed 102.65, about 5 should exceed 82.31, and about 10 should
exceed 73.11.

Table 1 compares the expected with the actual number of
F values exceeding the various cutoff points.

Table 1: Summary of Monte Carlo Test of Noncentral ANOVA

  alpha     Expected number        Actual number
            exceeding cutoff       exceeding cutoff

   .01             1                     0

   .05             5                     8

   .10            10                    11

The misfit for $\alpha$ = .05 is not as bad as it seems, since of
the 8 exceeding 82.31, three were barely above that value (82.36,
82.56, and 82.89).

Summary

The Monte Carlo test, in general, verified that the noncentral
ANOVA procedure is operating at near the appropriate
Type I error rate. Notice that an ordinary ANOVA procedure would
have yielded 100 statistically significant results where the
population has imposed upon it the characteristic of no practical
significance.
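
A comparable simulation can be sketched with the Python standard library. This is not the CAL DEVIATE or BMD01V procedure; it is a minimal stand-in that assumes population means of 40, 50, and 55, a common variance of 25, and 20 subjects per group. The function names and the seed are hypothetical, the ordinary cutoff of 3.16 is an approximate central F(2, 57) value at the .05 level, and 82.31 is the noncentral cutoff reported above.

```python
import random

def one_way_f(groups):
    """Fixed effects one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate(mus, sigma, n, reps, seed=0):
    """Generate `reps` independent experiments and return their F statistics."""
    rng = random.Random(seed)
    return [one_way_f([[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus])
            for _ in range(reps)]

f_values = simulate(mus=(40.0, 50.0, 55.0), sigma=5.0, n=20, reps=100)
ORDINARY_CUTOFF = 3.16     # approximate central F(2, 57) critical value, alpha = .05
NONCENTRAL_CUTOFF = 82.31  # cutoff reported in the paper for alpha = .05
print("exceeding ordinary cutoff:  ", sum(f > ORDINARY_CUTOFF for f in f_values))
print("exceeding noncentral cutoff:", sum(f > NONCENTRAL_CUTOFF for f in f_values))
```

Under these assumptions every simulated F clears the ordinary cutoff, while only a small share exceeds the noncentral one; the exact count depends on the seed and on the cutoff used.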



Summary and Discussion

Some of the common inappropriate uses of the traditional analysis
of variance and also the shortcomings inherent in the ANOVA model itself
have been described. The ANOVA model was modified to integrate practical
significance with statistical significance. The modified version, based
on the noncentral F distribution, included procedures for estimating the
required noncentrality parameter $\delta$, given that the researcher can state
a priori what constitutes a minimum practical difference among the group
means.

This proposed noncentral ANOVA would seem to be an improvement over
ANOVA in several aspects:

1. No longer can trivial (in the educational sense) results
attain statistical significance. Hence, the illogic of the concept
of "too large a sample" does not exist apart from cost effectiveness.

2. The researcher is forced to relate numerical scores with
practicality instead of analyzing scores in and of themselves.

3. Post-hoc contrasts (e.g., Scheffé, Tukey) are computed only
around non-trivial (in the educational sense) results.

4. A statistical rejection can no longer be followed by two
contradictory outcomes. In ordinary ANOVA, statistical significance
can go with (1) no practical significance or (2) a practical difference.
With noncentral ANOVA, statistical significance goes only with practical
significance because the appropriate hypothesis is being tested.

If noncentral ANOVA becomes widely used, it would be desirable
to have easily used noncentral F tables where a researcher need only
specify $\nu_1$, $\nu_2$, and $\delta$ to obtain the corresponding noncentral F value.
The various partial tables now in existence are geared mainly for
power calculations and are not readily usable (e.g., Tang, 1938; Cohen, 1969).

The overall importance of the procedures presented in this study
is the bringing together of statistics and practicality. This synergism
enables not only more meaningful presentation of results, but also the
powerful use of statistics in a complementary rather than ritualistic
way.

The use of noncentral ANOVA can improve the quality of data analysis
while at the same time being straightforward enough for understanding and
use by practitioners. E. S. Pearson (1938, p. 471) aptly described the
importance of noncomplex concepts for users:

If the object of the mathematical statistician is to provide
tools for practical use, it seems important that the connexion
between the abstract and the perceptual should be expressible
in terms of the simplest possible probability concepts.

Noncentral ANOVA would seem to meet this criterion while at the same
time providing a means of eliminating some of the crucial shortcomings
of the currently used ANOVA model.